A main characteristic of social media is that its diverse content, copiously generated by both 
standard outlets and general users, constantly competes for the scarce attention of large audiences. 
Out of this flood of information some topics manage to get enough attention to become the most 
popular ones and thus to be prominently displayed as trends. Equally important, some of these 
trends persist long enough so as to shape part of the social agenda. How this happens is the focus of 
this paper. By introducing a stochastic dynamical model that takes into account the user's repeated 
involvement with given topics, we can predict the distribution of trend durations as well as the 
thresholds in popularity that lead to their emergence within social media. Detailed measurements 
of datasets from Twitter confirm the validity of the model and its predictions. 



I. INTRODUCTION 



The past decade has witnessed an explosive growth of 
social media, creating a competitive environment where 
topics compete for the attention of users. A main 
characteristic of social media is that both users and 
standard media outlets generate content at the same time in 
the form of news, videos and stories, leading to a flood of 
information from which it is hard for users to sort out the 
relevant pieces to concentrate on [1]. User attention 
is critical for understanding how problems in culture, 
decision making and opinion formation evolve. Several 
studies have shown that attention allocated to on-line 
content is distributed in a highly skewed fashion. 
While most documents receive a negligible amount 
of attention, a few items become extremely popular and 
persist as public trends for a long period of time. 
Recent studies have focused on the dynamical growth 
of attention on different kinds of social media, including 
Digg [15-17], YouTube, Wikipedia and Twitter [24]. 
The time-scale over which content persists as a 
topic in these media also varies on a scale from hours to 
years. In the case of news and stories, content spreads 
on the social network until its novelty decays [15]. In 
information networks like Wikipedia, where a document 
remains alive for months and even years, popularity is 
governed by bursts of sudden events and is explained by 
the rank shift model [19]. 

While previous work has successfully addressed the 
growth and decay of news and topics in general, a re- 
maining problem is why some of the topics stay popular 
for longer periods of time than others and thus contribute 
to the social agenda. In this paper, we focus on the dy- 
namics of long trends and their persistence within social 
media. We first introduce a dynamic model of attention 
growth and derive the distribution of trend durations for 
all topics. By analyzing the resonating nature of the 
content within the community, we provide a threshold 
criterion that successfully predicts the long term 
persistence of social trends. The predictions of the model are 
then compared with measurements taken from Twitter, 
which as we show provides a validation of the proposed 
dynamics. 

This paper is structured as follows. In Section 2 we 
describe our model for attention growth and the persis- 
tence of trends. Section 3 describes the data-set and the 
collection strategies used in the study, whereas Section 
4 discusses the measurements made on data-sets from 
Twitter and compares them with the predictions of the 
model. Section 5 concludes with a summary of our find- 
ings and future directions. 
II. MODEL 



On-line micro-blogging and social service websites en- 
able users to read and send text-based messages to cer- 
tain topics of interest. The popularity of these topics 
is commonly measured by the number of postings about 
these topics. For instance, on Twitter, Digg and 
YouTube, users post their thoughts on topics of interest 
in the form of tweets and comments. One special char- 
acteristic of social media that has been ignored so far 
is that users can contribute to the popularity of a topic 
more than once. We take this into account by denoting 
the first post on a certain topic from a certain user as 
a First Time Post (FTP). If the same user 
posts on the topic more than once, we call each later post 
a Repeated Post (RP). In what follows, we first look at the growth 
dynamics of FTP. 

When a topic first catches people's attention, a few 
people may further pass it on to others in the community. 
If we denote the cumulative number of FTP mentioning 
the topic at time t by $N_t$, the growth of attention can 
be described by $N_t = (1 + \chi_t) N_{t-1}$, where the $\chi_t$ are 
assumed to be small, positive, independent and identically 
distributed random variables with mean $\mu$ and variance 
$\sigma^2$. For small $\chi_s$ the equation can be approximated as: 

$$N_t = \prod_{s=1}^{t} (1 + \chi_s)\, N_0 \approx \prod_{s=1}^{t} e^{\chi_s}\, N_0 = e^{\sum_{s=1}^{t} \chi_s} N_0. \qquad (1)$$

Taking logarithms on both sides, we obtain 
$\log(N_t / N_0) = \sum_{s=1}^{t} \chi_s$. Applying the central limit theorem to the sum, 
it follows that the cumulative count of FTP should obey 
a log-normal distribution. 
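
The multiplicative growth rule can be checked numerically. The following is a minimal sketch, not the procedure used in this paper: it assumes a hypothetical uniform distribution on [0, 0.1] for the growth rates and an arbitrary starting count of 1, simulates many independent topics, and reports the sample skewness of log(N_t), which should be close to zero if log(N_t) is approximately normal.

```python
import math
import random
import statistics

def simulate_log_growth(steps, n_topics, seed=0):
    """Return log(N_t / N_0) for n_topics independent topics after `steps` steps."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_topics):
        n = 1.0  # N_0, arbitrary starting count (assumption)
        for _ in range(steps):
            chi = rng.uniform(0.0, 0.1)  # hypothetical small positive growth rate
            n *= 1.0 + chi               # N_t = (1 + chi_t) * N_{t-1}
        results.append(math.log(n))
    return results

logs = simulate_log_growth(steps=50, n_topics=2000)
mean_log = statistics.fmean(logs)
sd_log = statistics.pstdev(logs)
# Sample skewness; a normal distribution of log(N_t) gives a value near zero.
skew = statistics.fmean([((x - mean_log) / sd_log) ** 3 for x in logs])
print(f"mean={mean_log:.3f} sd={sd_log:.3f} skew={skew:.3f}")
```

A near-zero skewness is only a coarse indication of normality; a Q-Q plot or a formal test gives a stricter check.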

We now consider the persistence of social trends. We 
use the variable vitality, $\phi_t = N_t / N_{t-1}$, as a measurement of 
popularity, and assume that if the vitality of a topic falls 
below a certain threshold $\theta_1$, the topic stops trending. 
Thus 

$$\log \phi_t = \log \frac{N_t}{N_{t-1}} = \log \frac{N_t}{N_0} - \log \frac{N_{t-1}}{N_0} = \chi_t. \qquad (2)$$



The probability of ceasing to trend at the time interval s 
is equal to the probability that $\phi_s$ is lower than a 
threshold value $\theta_1$, which can be written as: 

$$p = \Pr(\phi_s < \theta_1) = \Pr(\log \phi_s < \log \theta_1) = \Pr(\chi_s < \log \theta_1) = F(\log \theta_1), \qquad (3)$$



where $F(x)$ is the cumulative distribution function of the 
random variable $\chi$. We are thus able to determine the 
threshold value from $\log \theta_1 = F^{-1}(p)$ if we know the 
distribution of the random variable $\chi$. Notice that if $\chi_s$ is 
independent and identically distributed, it follows that 
the distribution of trending durations is given by a 
geometric distribution with $\Pr(L = k) = (1 - p)^k p$. The 
expected trending duration of a topic, $E(L)$, is therefore 
given by 

$$E(L) = \sum_{k=0}^{\infty} (1 - p)^k p \cdot k = \frac{1}{p} - 1 = \frac{1}{F(\log \theta_1)} - 1. \qquad (4)$$
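
The geometric law and its mean can be illustrated with a short simulation. This is a sketch under stated assumptions: the function names are hypothetical, and the value p = 0.12 is used purely as an illustrative input. Each simulated topic survives an interval with probability 1 - p and stops trending the first time its vitality falls below the threshold.

```python
import random

def expected_duration(p):
    """E(L) = 1/p - 1 for Pr(L = k) = (1 - p)^k * p, k = 0, 1, 2, ..."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p - 1.0

def sample_duration(p, rng):
    """Count the intervals a topic survives before it stops trending."""
    k = 0
    while rng.random() >= p:  # with probability 1 - p the topic keeps trending
        k += 1
    return k

rng = random.Random(1)
p = 0.12  # illustrative stopping probability
samples = [sample_duration(p, rng) for _ in range(50000)]
empirical = sum(samples) / len(samples)
print(expected_duration(p), empirical)
```

The empirical mean of the simulated durations should agree with the closed form of Eq. 4 up to sampling noise.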



Thus far we have only considered the impact of FTP 
on social trends by treating all topics as identical to each 
other. To account for the resonance between users and 
specific topics we now include the RP into the dynamics. 
We define the instantaneous number of FTP posted 
in the time interval t as $FTP_t$, and the repeated posts, 
RP, in the time interval t as $RP_t$. Similarly we denote 
the cumulative number of all posts, including both FTP 
and RP, as $S_t$. The resonance level of fans with a given 
topic is measured by $\mu_t = (FTP_t + RP_t) / FTP_t$, and we define the 
expected value of $\mu_t$, $E(\mu_t)$, as the active-ratio $a_q$. 

We can simplify the dynamics by assuming that $\mu_t$ is 
independent and uniformly distributed on the interval 
$[1, 2a_q - 1]$. It then follows that the increment of $S_t$ is 
given by the sum of $FTP_t$ and $RP_t$. We thus have 



$$S_t - S_{t-1} = FTP_t + RP_t = \mu_t FTP_t = \mu_t (N_t - N_{t-1}) = \mu_t \chi_t N_{t-1}. \qquad (5)$$
FIG. 1: The normal Q-Q plot of $\log(N_{10})$. The straight line 
shows that the data follows a lognormal distribution with a 
slightly shorter tail. 



And also 

$$E_\mu(S_t) = E_\mu(S_{t-1}) + a_q (N_t - N_{t-1}) = E_\mu(S_{t-2}) + a_q (N_t - N_{t-2}) = \cdots = E_\mu(S_0) + a_q (N_t - N_0) \approx a_q N_t. \qquad (6)$$

We approximate $S_{t-1}$ by $\mu_t N_{t-1}$. Substituting this back into Eq. 5, 
we have 

$$S_t \approx \mu_t (\chi_t + 1) N_{t-1} \approx \mu_t e^{\chi_t} N_{t-1}. \qquad (7)$$
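
The relation between the cumulative post count and the cumulative FTP count can be checked with a small simulation. This is a minimal sketch and not part of the original analysis: it assumes a hypothetical uniform growth rate on [0, 0.1], an initial condition S_0 = a_q N_0, and a resonance level drawn uniformly from [1, 2a_q - 1] as in the text. Under those assumptions the average of S_t / N_t should be close to the active-ratio.

```python
import random

def simulate_posts(a_q, steps, rng, n0=100.0):
    """Simulate cumulative first-time posts N_t and all posts S_t for one topic."""
    n_prev, s = n0, a_q * n0  # assumption: S_0 = a_q * N_0
    for _ in range(steps):
        chi = rng.uniform(0.0, 0.1)             # hypothetical growth rate
        mu = rng.uniform(1.0, 2.0 * a_q - 1.0)  # resonance level with mean a_q
        ftp = chi * n_prev                      # FTP_t = N_t - N_{t-1}
        s += mu * ftp                           # increment of S_t from Eq. 5
        n_prev += ftp
    return n_prev, s

rng = random.Random(2)
a_q = 1.2
ratios = []
for _ in range(2000):
    n_t, s_t = simulate_posts(a_q, steps=30, rng=rng)
    ratios.append(s_t / n_t)
mean_ratio = sum(ratios) / len(ratios)
print(mean_ratio)
```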



From this, it follows that the dynamics of the full 
attention process is determined by the two independent 
random variables, $\chi$ and $\mu$. Similarly to the derivation 
of Eq. 3, the topic is assumed to stop trending if the 
value of either one of the random variables governing the 
process falls below the thresholds $\theta_1$ and $\theta_2$, respectively. 
The probability of ceasing to trend, defined as $p^*$, is now 
given by 

$$p^* = \Pr(\chi_t < \log \theta_1) \Pr(\mu_t < \theta_2) = \frac{\theta_2 - 1}{2(a_q - 1)}\, p, \qquad (8)$$

where $p = F(\log \theta_1)$. The expected value of $L_q$ for any topic 
q is given by 

$$E(L_q) = \frac{2(a_q - 1)}{F(\log \theta_1)(\theta_2 - 1)} - 1. \qquad (9)$$



This states that the persistent duration of trends 
associated with given topics is expected to scale linearly with 
the topic users' active-ratio. From this result it follows 
that one can predict the trend duration for any topic by 
measuring its user active-ratio after the values of $\theta_1$ and 
$\theta_2$ are determined from empirical observations. 
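
The linear dependence on the active-ratio can be made concrete with a few lines of code. This is a sketch under stated assumptions: the function names are hypothetical, the inputs p = 0.2 and theta2 = 1.1 are illustrative values chosen for round numbers rather than measurements, and the clamp on the uniform probability is an added safeguard not discussed in the text.

```python
import math

def stop_probability(a_q, p, theta2):
    """p* of Eq. 8: p times Pr(mu_t < theta2) for mu_t uniform on [1, 2*a_q - 1]."""
    if a_q <= 1.0:
        raise ValueError("a_q must exceed 1 so that the interval is non-degenerate")
    frac = (theta2 - 1.0) / (2.0 * (a_q - 1.0))
    return p * min(max(frac, 0.0), 1.0)  # clamp: a probability stays in [0, 1]

def expected_duration_q(a_q, p, theta2):
    """E(L_q) = 1/p* - 1, which equals Eq. 9 whenever the clamp is inactive."""
    p_star = stop_probability(a_q, p, theta2)
    return math.inf if p_star == 0.0 else 1.0 / p_star - 1.0

# Illustrative inputs only (hypothetical values for p and theta2).
p, theta2 = 0.2, 1.1
d1 = expected_duration_q(1.2, p, theta2)
d2 = expected_duration_q(1.4, p, theta2)
d3 = expected_duration_q(1.6, p, theta2)
print(d1, d2, d3)
```

Equal steps in the active-ratio produce equal steps in the expected duration, which is the linear scaling stated above.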



III. DATA 

To test the predictions of our dynamic model, we 
analyzed data from Twitter, an extremely popular social 
network website used by over 200 million users around 
the world. Its interface allows users to post short 
messages, known as tweets, that can be read and retweeted 
by other Twitter users. Users declare the people they 
follow, and they get notified when there is a new post from 
any of these people. A user can also forward the 
original post of another user to their followers by the re-tweet 
mechanism. 

FIG. 2: Distribution of $\chi$ over different t and all social trends. 
$\log(\chi)$ follows a normal distribution with mean equal to 
-1.4522 and standard deviation equal to 0.6715. (a) The 
frequency plot of $\log(\chi)$. (b) The Q-Q plot of $\log(\chi)$. 

In our study, the cumulative count of tweets and re- 
tweets that are related to a certain topic was used as a 
proxy for the popularity of the topic. On the front page of 
Twitter there is also a column named trends that presents 
the few keywords or sentences that are most frequently 
mentioned on Twitter at a given moment. The list of 
popular topics in the trends column is updated every few 
minutes as new topics become popular. We collected the 
topics in the trends column by performing an API query 
every 20 minutes. For each of the topics in the trending 
column, we used the Search API function to collect the 
full list of tweets and re-tweets related to the topic over 
the past 20 minutes. We also collected information about 
the author of the post, identified by a unique user-id, 
the text of the post and the time of its posting. We 
thus obtained a dataset of 16.32 million posts on 3361 
different topics. The longest trending topic we observed 
had a length of 14.7 days. We found that of all the posts 
in our dataset, 17% belonged to the RP category. 
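
Splitting posts into FTP and RP only requires the topic and the user-id of each post. The following is a minimal sketch, not the collection code used in this study: the record layout (topic, user_id) and the sample data are hypothetical, and the ratio is computed over the whole lifetime of a topic as a simplified stand-in for the per-interval resonance level defined in Section II.

```python
from collections import defaultdict

def active_ratio(posts):
    """posts: iterable of (topic, user_id) pairs.
    Returns {topic: (FTP + RP) / FTP}, i.e. all posts over distinct posters."""
    seen = defaultdict(set)    # topic -> user ids that have already posted
    totals = defaultdict(int)  # topic -> count of all posts (FTP + RP)
    for topic, user_id in posts:
        seen[topic].add(user_id)  # a user's first post on a topic is an FTP
        totals[topic] += 1
    return {t: totals[t] / len(seen[t]) for t in totals}

# Hypothetical sample: users 1 and 2 each post twice on the first topic.
sample = [("worldcup", 1), ("worldcup", 2), ("worldcup", 1),
          ("election", 3), ("worldcup", 2), ("election", 4)]
ratios = active_ratio(sample)
print(ratios)
```

A topic on which nobody posts twice has a ratio of exactly 1, and the ratio grows with the share of repeated posts.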




[Figure 3 plot; x-axis: Log(Time Step) (20 Min)] 

FIG. 3: The linear scaling relationship between $R_n$ and $\log(t)$ 
for the topic 'Kim Chul Hee', a Korean pop star. The topic kept 
trending for 14 days on Twitter in September 2010. The 
number of records that have occurred up to time t scales linearly 
with $\log(t)$. 



IV. RESULTS 

From the data-set that we analyzed, we found that, 
for a fixed time interval of 200 minutes (t = 10), $N_t$ over 
all trends follows a log-normal distribution. As can be 
seen from Figure 1, the normal Q-Q plot of $\log(N_{10})$ 
follows a straight line. Different values of t yield 
similar results. The Kolmogorov-Smirnov normality test of 
$\log(N_{10})$ with mean 3.5577 and standard deviation 0.3266 
yields a p-value of 0.0838. At a significance level of 0.05, 
the test fails to reject the null hypothesis that $\log(N_{10})$ 
follows a normal distribution, a result which is consistent 
with Equation 1. 
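
The Kolmogorov-Smirnov statistic behind such a normality test is straightforward to compute. The following is a sketch on synthetic data, assuming hypothetical log-normal parameters in place of the measured counts; it computes only the distance D between the empirical distribution and a fitted normal, not a p-value, since the p-value depends on the reference distribution of D and is affected by estimating the mean and standard deviation from the same sample.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def ks_statistic_normal(values):
    """Kolmogorov-Smirnov distance between the sample and a fitted normal."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return d

rng = random.Random(3)
# Synthetic stand-in for the counts: log-normal with hypothetical parameters.
log_counts = [math.log10(rng.lognormvariate(8.0, 0.75)) for _ in range(500)]
d_normal = ks_statistic_normal(log_counts)
d_skewed = ks_statistic_normal([10 ** v for v in log_counts])
print(d_normal, d_skewed)
```

The distance is small for the logarithms and noticeably larger for the raw counts, which is the pattern expected of log-normally distributed data.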

We also obtained the distribution of $\chi$ from 
$\chi_t = N_t / N_{t-1} - 1$. $\log(\chi)$ follows a normal distribution with 
mean equal to -1.4522 and a standard deviation of 
0.6715, as shown in Figure 2. The Kolmogorov-Smirnov 
normality test gives a high p-value of 0.5346. 
The mean value of $\chi$ is 0.0353, which is small enough for the 
approximations in Equation 1 and Equation ?? to be valid. 
We also examined the record breaking values of vitality, 
$\phi_t = \chi_t + 1$, which signal the behavior of the longest 
lasting trends. From the theory of records, if the values 
$\phi_t$ come from an independent and identical distribution, 
the number of records that have occurred up to time t, 
defined as $R_n$, should scale linearly with $\log(t)$ [24, 25]. 
As is customary, we say that a new record has been 
established if the vitality of the trend at the moment is 
larger than all of the previous observations. As can be 
seen from Figure 3, this linear scaling is observed over a 
wide variety of topics. One implication of this observation 
is that it confirms the validity of our assumption that 
the values of $\chi_1, \chi_2, \ldots, \chi_t$ are independent and identically 
distributed. 

FIG. 4: Semi-log plot of trending duration density. The 
straight line suggests an exponential family of the trending 
time distribution. The red line gives a fitting with R-square 
0.9112. 

FIG. 5: Density plot of trending duration in log-log scale. 
The distribution of duration deviates from a power law. 

FIG. 6: Frequency count of active-ratio over all topics. The 
maximum ratio is 1.2 among all topics. 
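
The logarithmic growth of the number of records for independent and identically distributed values can be reproduced numerically. This is a minimal sketch on synthetic uniform random numbers, not on the Twitter data; it compares the average record count with the harmonic number H_n, which is the exact expectation for i.i.d. continuous values and grows like log(n).

```python
import math
import random

def count_records(values):
    """Number of upper records: entries strictly larger than all earlier ones."""
    records, best = 0, -math.inf
    for v in values:
        if v > best:
            records += 1
            best = v
    return records

rng = random.Random(4)
n, trials = 1000, 1000
total = sum(count_records([rng.random() for _ in range(n)]) for _ in range(trials))
avg = total / trials
harmonic = sum(1.0 / k for k in range(1, n + 1))  # exact mean for i.i.d. data
print(avg, harmonic, math.log(n))
```

A sequence with a systematic drift or strong correlations would typically produce record counts that depart from this logarithmic law, which is why the observed scaling supports the i.i.d. assumption.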

Next we turn our attention to the distribution of 
durations of long trends. As shown in Figure 4 and Figure 5, 
a linear fit of the trending duration density on a 
logarithmic scale suggests an exponential family, which is 
consistent with Eq. 4. The R-square of the fit has a value 
of 0.9112. From the log-log scale plot in Figure 5 we 
observe that the distribution deviates from a power law, 
which is a characteristic of social trends that originate 
from news on social media [23]. From the distribution 
of trending times, p is estimated to have a value of 0.12. 
Together with the measured distribution of $\chi$ and Eq. 3, 
we can estimate the value of $\theta_1$ to be 1.0132. 
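
The two estimation steps can be sketched in code. This is an illustration under explicit assumptions and not the fitting procedure of this paper: p is estimated here by maximum likelihood for the geometric law rather than by a linear fit on the semi-log plot, the function names are hypothetical, and every logarithm is taken to be base 10, a choice suggested by the reported mean of $\chi$ (0.0353 is 10 raised to -1.4522) but not stated in the text.

```python
from statistics import NormalDist

def estimate_p(durations):
    """Maximum-likelihood p for Pr(L = k) = (1 - p)^k * p, using E(L) = 1/p - 1."""
    mean_len = sum(durations) / len(durations)
    return 1.0 / (1.0 + mean_len)

def theta1_from_p(p, mean_log10_chi, sd_log10_chi):
    """Invert Eq. 3, p = F(log theta_1), for log-normally distributed chi.
    Assumption: every logarithm here is base 10."""
    log10_chi_p = NormalDist(mean_log10_chi, sd_log10_chi).inv_cdf(p)
    chi_p = 10.0 ** log10_chi_p  # p-quantile of chi, i.e. F^{-1}(p)
    return 10.0 ** chi_p         # theta_1 such that log10(theta_1) = F^{-1}(p)

theta1 = theta1_from_p(0.12, -1.4522, 0.6715)
print(round(theta1, 4))
```

With the reported values of p and of the distribution of $\log(\chi)$, this base-10 reading gives a threshold of about 1.013, close to the estimate quoted above.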

We can also determine the expected duration of trend 
times stemming from the impact of active-ratio. The 
frequency count of active-ratios over different topics is 
shown in Figure 6, with a peak at $a_q = 1.2$. As can be 
seen in Figure 7, the trend duration of different topics 
scales linearly with the active-ratio, which is consistent 


FIG. 7: Linear relationship between trending duration and 
active-ratio, in good agreement with the predictions of the model. 



with the prediction of Eq. 9. The R-square of the linear 
fitting has a value of 0.98664. From the slope of the 
linear fit, $\theta_1 = 1.0132$ and Eq. 9, we obtain a value 
of $\theta_2 = 1.153$. With the values of $\theta_1$ and $\theta_2$, we are able 
to predict the expected trend duration of any given topic 
based on measurements of its active-ratio. 
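
Recovering the second threshold from the slope of the linear fit amounts to inverting Eq. 9. The sketch below is a consistency check on synthetic data, not a re-analysis of the measurements: it assumes that durations are expressed in the same units as in Eq. 9, uses hypothetical active-ratio values, generates durations from Eq. 9 with a known threshold, and then recovers that threshold from an ordinary least-squares fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def theta2_from_slope(slope, p):
    """Eq. 9 is linear in a_q with slope 2 / (p * (theta_2 - 1)); solve for theta_2."""
    return 1.0 + 2.0 / (p * slope)

# Synthetic check: durations generated from Eq. 9 with a known theta_2.
p, true_theta2 = 0.12, 1.153
active = [1.05, 1.10, 1.15, 1.20]  # hypothetical active-ratios
durations = [2.0 * (a - 1.0) / (p * (true_theta2 - 1.0)) - 1.0 for a in active]
slope, intercept = fit_line(active, durations)
recovered = theta2_from_slope(slope, p)
print(slope, intercept, recovered)
```

On real measurements the fitted intercept offers an additional check, since Eq. 9 predicts that it equals minus the slope minus one.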



V. DISCUSSION AND CONCLUSION 

In this paper we investigated the persistence dynamics 
of trends in social media. By introducing a stochastic dy- 
namic model that takes into account the user's repeated 
involvement with given topics, we are able to predict the 
distribution of trend durations as well as the thresholds 
in popularity that lead to the emergence of given top- 
ics as trends within social media. The predictions of our 
model were confirmed by a careful analysis of data from 
Twitter. Furthermore, a linear relationship between the 
resonance level of users with given topics and the 
trending duration of a topic was derived. The predictive power 
of this model provides a deeper understanding of the 
popularity of on-line content. Possible refinements may 
include the effect of competition between topics, sudden 
bursts of events, the effect of marketing campaigns, or 
any combination of them. In closing, we note that 
although the focus in this paper has been on trend 
dynamics that are featured on social media websites, the 
framework and model may be suitable to other types 
of content and off-line trends. The issue raised, that 
is, trending phenomena under the impact of users' 
repeated involvement, is therefore a general one and should 
provide ample opportunities for future work. 